arXiv:1501.05973v2 [q-bio.NC] 27 Jan 2015 


Journal of Machine Learning Research 1 (2015) 1-48 


Submitted 1/15; Published 1/15 


Inferring and Learning from Neuronal Correspondences 


Ashish Kapoor akapoor@microsoft.com 

Microsoft Research 
One Microsoft Way 
Redmond, WA 98052, USA 

E. Paxon Frady efrady@ucsd.edu 

Department of Neurobiology 
University of California 
San Diego, CA 92093, USA 

Stefanie Jegelka stefje@mit.edu 

Department of Electrical Engineering and Computer Science 
Massachusetts Institute of Technology 
Cambridge, MA 02139, USA 

William B. Kristan kristan@ucsd.edu 

Department of Neurobiology 
University of California 
San Diego, CA 92093, USA 

Eric Horvitz horvitz@microsoft.com 

Microsoft Research 
One Microsoft Way 
Redmond, WA 98052, USA 

Editor: TBD 


Abstract 


We introduce and study methods for inferring and learning from correspondences among 
neurons. The approach enables alignment of data from distinct multiunit studies of nervous 
systems. We show that the methods for inferring correspondences combine data effectively 
from cross-animal studies to make joint inferences about behavioral decision making that 
are not possible with the data from a single animal. We focus on data collection, machine
learning, and prediction in the representative and long-studied invertebrate nervous
system of the European medicinal leech. Acknowledging the computational intractability
of the general problem of identifying correspondences among neurons, we introduce efficient
computational procedures for matching neurons across animals. The methods include
techniques that adjust for missing cells or additional cells in the different data sets that
may reflect biological or experimental variation. The methods highlight the value of
harnessing inference and learning in new kinds of computational microscopes for multiunit
neurobiological studies.

1. Introduction


Neurobiologists have long pursued an understanding of the emergent phenomena of nervous 
systems, such as the neuronal basis for choice and behavior. Much research on neuronal 
systems grapples with the complex dynamics of interactions among multiple neurons. New
techniques, such as calcium imaging, voltage-sensitive dye (VSD) imaging (Cacciatore et al.,
1999; Gonzalez and Tsien, 1995) and multi-unit electrode recordings, enable larger views of
nervous systems. However, for many experimental preparations, the amount of data that
can be collected via tedious experiments is limited. As an example, data from
voltage-sensitive dyes are time-limited because of bleaching of the dyes and also neuronal
damage caused by phototoxicity.

We have developed methods for combining the data from multiple experiments to pool
data on neural function. The approach allows us to make inferences from data sets that
are impossible to obtain from individual preparations. Coalescing the data from multiple
experiments is an intrinsically difficult problem because of the difficulty in matching cells
and their roles across animals. Variation is observed in nervous systems of individual animals
based on developmental differences as well as artifacts introduced in the preparation and
execution of experiments. Developing a means for identifying correspondences in cells across
animals would allow data to be pooled from multiple animals supporting deeper inferences
about neuronal circuits and behaviors.


We focus specifically on experimental studies of neurons composing the ganglia of H.
verbana (Briggman et al., 2005). The leech has a stereotypical nervous system consisting
of repeating packets of about 400 neurons. About a third of these neurons have been
identified, and these neurons can be found reliably in different animals. The remaining
two-thirds of neurons have yet to be identified, but are believed to maintain similar
properties and functional roles across animals. The general problem of correspondence
matching of the cells of two different animals is illustrated in Figure 1. We seek to identify
neurons that are equivalent across ganglia obtained from different animals. For example, the
red cell in animal a has several candidate correspondences in animal b, but with varying
degrees of similarity (indicated by shades of red). Ideally, we will find a one-to-one match
by jointly considering multiple similarities. With only two animals, this problem can be
mapped to a bipartite graph-matching task and can be solved optimally (Munkres, 1957).
However, we want to jointly solve the matching task for larger numbers of animals. Such
matching across multiple graphs defined by the individual nervous systems is an intractable
problem in the NP-hard class (Papadimitriou and Steiglitz, 1982). In addition, the problem
is even more difficult because such matching must also take into account variations in the
numbers and properties of neurons observed in different animals. These variations can be
due to both developmental differences (e.g., some neurons may be missing or duplicated) and
experimental artifacts (e.g., some neurons may be out of the plane of focus or destroyed in
the delicate dissection). A key challenge in this endeavor is the formulation of a similarity
measure that takes into account physical parameters of cells, such as their size and location,
as well as their functional properties.
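
To make the two-animal case concrete, the following minimal sketch finds the optimal one-to-one match for a tiny, made-up similarity table by exhaustive search. Exhaustive search is used here only for illustration; the Hungarian method (Munkres, 1957) solves the same assignment problem in polynomial time. The function name `best_one_to_one` and all numbers are hypothetical.

```python
import math
from itertools import permutations

def best_one_to_one(similarity):
    """Exhaustively find the one-to-one assignment of rows (cells of animal a)
    to columns (cells of animal b) that minimizes the summed -log similarity.
    Assumes a square table with strictly positive entries."""
    n = len(similarity)
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(n)):
        cost = sum(-math.log(similarity[i][perm[i]]) for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost

# Made-up similarities between 3 cells of animal a (rows) and 3 cells of animal b (columns).
sim = [[0.9, 0.3, 0.1],
       [0.8, 0.7, 0.2],
       [0.1, 0.2, 0.6]]

match, cost = best_one_to_one(sim)
print("cell i in animal a -> cell match[i] in animal b:", match)
```

Note that cell 1 of animal a is individually most similar to cell 0 of animal b, but the joint one-to-one solution assigns it elsewhere, which is the point of solving the assignment jointly.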
Figure 1: Challenge of identifying correspondence among neurons in ganglia from different
animals. Given compatibility constraints, a correspondence algorithm seeks a
one-to-one mapping between neurons in two H. verbana preparations. The goal is to
find correspondences between the cells of the two animals. The color coding
illustrates compatibility constraints: Feasible matches for the highlighted red,
green, and blue cells in animal 1 (c) are found in animal 2 (d), as indicated with
matched colors. The degree of feasibility of the matches is depicted via shading of
cells in animal b, where the most compatible cell for each source neuron in animal
a is highlighted with a white border. Although the figure shows only two animals,
such compatibility constraints occur across all pairs of the 6 animals used.
Table 1: The proposed algorithmic framework

Step 1: Learn Compatibility Measure
    Given training data with pairs of match and non-match cells, estimate the parameter
    matrix $A$ that defines the compatibility function $f^{ab}(i, j)$ between any $i$-th
    and $j$-th neurons for all animal pairs.

Step 2: Recover Correspondence Map
    Start with an initialized empty match set $S_0$. Iteratively determine the next best
    match $M_t$ by solving Equation 3 and update $S_t = S_{t-1} \cup M_t$. End when all
    the cells are matched.

Step 3: Infer Missing Data
    Construct the matrix $Y$ that aggregates data from all the animals, where each row
    corresponds to a cell and the rows are permuted according to the matching. Use
    Probabilistic PCA on $Y$ to infer the missing data.


2. Machine Learning Framework 


We use a set of neuronal data collected in Briggman et al. (2005), which consists of optical
VSD recordings from populations of 123-148 neurons in a mid-body segmental ganglion from
six different leeches. Earlier research on this data identified neurons involved in
decision-making. In particular, the study aimed at understanding the roles of neuronal
populations in decisions to swim or to crawl following stimulation. Sensory neurons (DP
nerve) were stimulated in such a way that would elicit, with equal (0.5) likelihood, swimming
or crawling. This previous study considered single cell activations and joint analysis of
neurons using the dimensionality reduction techniques of PCA and LDA. However, these
techniques were limited to one animal at a time. In the current study, we propose a framework
that analyzes data across animals to increase the power of the analysis.


Rather than using a handcrafted measure, we have employed a machine-learning framework
that relies on supervised training data. This algorithm estimates an appropriate
similarity function between neurons in different animals based on a training set of
high-confidence correspondences. These correspondences are readily identified neurons in the
nervous systems of H. verbana (Muller et al., 1981). An important capability of the
algorithm is to take into account the probabilistic nature of inferred correspondences. The
algorithm begins by learning a weighting function of relevant features that maximizes the
likelihood of matches within the training set. The next step of the approach is to jointly
solve the correspondence matching problem for neurons across animals, while considering
potential missing or extra cells in each animal. The final step is to consider correspondences
with functions that fill in missing data. As we will demonstrate below, pooling
neurophysiological data from multiple studies in a principled manner leads to larger effective
data sets with greater statistical power than the individual studies.

Specifically, the pipeline for the methodology includes three steps (detailed in Table 1):
(1) determining a similarity score across pairs of cells, (2) recovering correspondences that
are consistent with the similarity measure, and (3) estimating missing data. We describe
these steps in detail below.

2.1 Learning Similarity Measure for Cells 

The goal in Step 1 is to learn a similarity function $f^{ab}(i, j)$ indicating the feasibility
of a match between the $i$-th cell in animal $a$ and the $j$-th cell in animal $b$. The most
desirable characteristic for such a function is a high positive value for likely matches and
diminishing values for poor matches. Such a characteristic is captured by the exponentiation
of a negative distance measure among sets of features that represent multiple properties of
cells. Formally:

$$f^{ab}(i, j) = \exp\left( -\left[\phi^a(i) - \phi^b(j)\right]^T A \left[\phi^a(i) - \phi^b(j)\right] \right) \qquad (1)$$

Here, $\phi(\cdot)$ are $d$-dimensional feature representations for the individual cells for
each animal and summarize physical (e.g., size, location, etc.) and functional (e.g., optical
recordings) properties. $A$ is a $d \times d$ parameter matrix with positive entries that are
learned from data. Intuitively, the negative log of the similarity function is a distance
function between the feature representations: a zero distance between two feature vectors
results in the highest similarity measure of 1, whereas representations farther apart in the
feature space have a diminishing value. The matrix $A$ parameterizes this distance measure.
Given training data consisting of several probable pairs of matched neurons, we estimate $A$
by solving an optimization problem. We describe the details below.
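
As a minimal sketch of such a similarity function, assuming the quadratic-form distance suggested by the description above, the snippet below evaluates it for made-up two-dimensional features. The function name `similarity`, the feature values, and the entries of `A` are all hypothetical.

```python
import math

def similarity(phi_a, phi_b, A):
    """Return exp(-(phi_a - phi_b)^T A (phi_a - phi_b)) for a d x d matrix A
    given as a list of rows. Identical feature vectors give exactly 1.0."""
    diff = [x - y for x, y in zip(phi_a, phi_b)]
    d = len(diff)
    quad = sum(diff[r] * A[r][c] * diff[c] for r in range(d) for c in range(d))
    return math.exp(-quad)

# Hypothetical 2-D features (e.g., a normalized position and size); values are made up.
A = [[1.0, 0.0],
     [0.0, 0.5]]
cell_i = [0.2, 0.4]
near_j = [0.2, 0.4]   # identical features
far_j = [0.9, 1.4]    # clearly different features

print(similarity(cell_i, near_j, A))
print(similarity(cell_i, far_j, A))
```

Taking the negative log of the returned value recovers the distance itself, which is the quantity used in the matching step.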

The following features were used in our work:


• Structural features: Absolute position of the cell with respect to the entire observed
frame, relative position of the cell in relation to the entire ganglion, absolute size of the
cell in pixels, indicator vector specifying the packet the neuron is located in (among Central,
Left Anterior, Left Posterior, Right Anterior, Right Posterior or Central Posterior
packet), and relative position coordinate of the neuron in its respective packet.

• Functional features: Coherence of electrophysiological observations with swim
oscillations and single cell discrimination time (see Briggman et al. (2005)) that
distinguishes swim from crawl behavior.


Intuitively, the optimization problem finds the parameter $A$ that minimizes the distance
between pairs of cells that were tagged as matches, while maximizing the distance among
other pairs. Formally, parameter $A$ of the compatibility measure is estimated by minimizing
the objective:

$$A^* = \arg\min_A \sum_{i,j} \Big[ -2 \log f^{ab}(i, j) + \log \sum_{j' \neq j} f^{ab}(i, j') + \log \sum_{i' \neq i} f^{ab}(i', j) \Big] \qquad (2)$$

subject to the constraint that all entries of $A$ are positive. The sum is over all the labeled
training pairs $(i, j)$ tagged as likely matches. Intuitively, the first term
$-2 \log f^{ab}(i, j)$ in the objective prefers solutions that would collapse the distance
between matched pairs to zero, while the rest of the terms prefer a solution where the distance
between the rest of the cells is maximized. The optimization is straightforward, and simple
gradient descent will always find a locally optimal solution. Note that a more appropriate
constraint is a positive semi-definite condition on $A$; however, we suggest using a
non-negativity constraint due to simplicity in optimization with almost no reduction in
performance of the pipeline.
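
The objective can be evaluated directly for a candidate $A$. The sketch below does only that (no optimizer is included) and compares two hand-picked diagonal matrices on made-up data in which the first feature agrees across matched cells and the second does not. The names `f` and `objective`, the features, and both matrices are hypothetical, and the quadratic-form similarity is an assumption.

```python
import math

def f(phi_a, phi_b, A):
    """Similarity exp(-(phi_a - phi_b)^T A (phi_a - phi_b)); assumed form."""
    diff = [x - y for x, y in zip(phi_a, phi_b)]
    d = len(diff)
    return math.exp(-sum(diff[r] * A[r][c] * diff[c]
                         for r in range(d) for c in range(d)))

def objective(A, feats_a, feats_b, matches):
    """Sum over labeled matches (i, j) of
    -2 log f(i, j) + log sum_{j' != j} f(i, j') + log sum_{i' != i} f(i', j)."""
    total = 0.0
    for i, j in matches:
        other_cols = sum(f(feats_a[i], feats_b[jp], A)
                         for jp in range(len(feats_b)) if jp != j)
        other_rows = sum(f(feats_a[ip], feats_b[j], A)
                         for ip in range(len(feats_a)) if ip != i)
        total += (-2.0 * math.log(f(feats_a[i], feats_b[j], A))
                  + math.log(other_cols) + math.log(other_rows))
    return total

# Made-up features: dimension 0 is consistent across matched cells, dimension 1 is noise.
feats_a = [[0.0, 0.3], [1.0, 0.9], [2.0, 0.1]]
feats_b = [[0.1, 0.8], [1.1, 0.2], [1.9, 0.6]]
matches = [(0, 0), (1, 1), (2, 2)]

A_informative = [[5.0, 0.0], [0.0, 0.01]]   # weights the consistent feature
A_noise = [[0.01, 0.0], [0.0, 5.0]]         # weights the noisy feature

print(objective(A_informative, feats_a, feats_b, matches))
print(objective(A_noise, feats_a, feats_b, matches))
```

A matrix that weights the consistent feature yields the lower objective, which is the behavior a gradient-based optimizer would exploit when fitting $A$.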


2.2 Correspondence Matching 

In a second step, we calculate the correspondence matches across all the animals. Instead
of calculating all the matches simultaneously, the framework follows an iterative procedure:
future matches are made not only by using the similarity function, but also by comparing
the geometric and structural relationship of the candidates to the past matches. Besides
considering the distances induced by the similarity function (i.e., $-\log f^{ab}(i, j)$), and
unlike past work on graph matching (Williams et al., 1997; Bunke, 2000), the proposed method
utilizes knowledge of landmarks by inducing constraints that impose topological and
geometric invariants. This match-making algorithm considers the iteration $t$ and denotes the
set of already determined matches by $S_t$. The algorithm then determines the next set $M_t$
of neurons from all the animals to be matched by solving the following optimization task:

$$M_t = \arg\min_{M} \sum_{\text{all pairs } (i,j) \in M} -\log f^{ab}(i, j) + \lambda D_{LM}(i, j, S_t) \qquad (3)$$

Here $\lambda$ is the trade-off parameter that balances the compatibility measure with landmark
distances $D_{LM}(\cdot)$ from the matches recovered in all the prior iterations. The landmark
distance computation provides important structural and topological constraints for solving
the correspondence tasks. Given anchor points, the landmark distances attempt to capture
structural and locational relationships with respect to the available landmarks. There are
several options such as commute distance (McKay, 1981; Lovasz, 1993) on a nearest-neighbor
graph, or Euclidean distance computed by considering either the locations or the feature
representation of the neurons. In our experiments, we compute landmark distances between
neuron $i$ in animal $a$ and neuron $j$ in animal $b$ with respect to a set of anchor points
$S$ as:

$$D_{LM}(i, j, S) = \sum_{i' \in S} \log f^{aa}(i, i') - \sum_{j' \in S} \log f^{bb}(j, j') \qquad (4)$$


The optimization problem in Equation 3 is solved using off-the-shelf energy minimization
procedures (Boykov et al., 2001; Minka, 2005). The set of newly discovered matches is
then included and the process is repeated until all matches stay the same (settle).
Essentially, the goal is to find a set of matched neurons across all the animals such
that the objective function is minimized. We start with a reasonable initialization of the
solution (for example, by solving for consecutive pairs of animals). This solution is
iteratively refined by considering data drawn from one study at a time and searching for a
replacement neuron that would lower the total energy. Such replacements continue until no
further minimization is observed.
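
The role of the landmark term can be illustrated with a small sketch. It is not the energy-minimization procedure used in our implementation: it greedily picks a single next pair, uses cell positions as the only feature with an identity parameter matrix, and wraps Equation 4 in an absolute value so that the term acts as a non-negative mismatch. All three choices, the function names, and the positions are assumptions made for illustration.

```python
def log_f(p, q):
    """log of exp(-||p - q||^2): stand-in for a learned log-similarity
    (assumes positions as features and an identity parameter matrix)."""
    return -sum((x - y) ** 2 for x, y in zip(p, q))

def landmark_distance(i, j, pos_a, pos_b, S):
    """Equation 4 over the matched landmark pairs in S, wrapped in abs()
    (the abs() is an assumption of this sketch, not part of Equation 4)."""
    sum_a = sum(log_f(pos_a[i], pos_a[ip]) for ip, _ in S)
    sum_b = sum(log_f(pos_b[j], pos_b[jp]) for _, jp in S)
    return abs(sum_a - sum_b)

def next_match(pos_a, pos_b, S, lam):
    """Return (cost, i, j) for the unmatched pair minimizing
    -log f(i, j) + lam * D_LM(i, j, S)."""
    used_a = {i for i, _ in S}
    used_b = {j for _, j in S}
    best = None
    for i in range(len(pos_a)):
        if i in used_a:
            continue
        for j in range(len(pos_b)):
            if j in used_b:
                continue
            cost = -log_f(pos_a[i], pos_b[j]) \
                   + lam * landmark_distance(i, j, pos_a, pos_b, S)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best

# Made-up positions: animal b is animal a shifted by 0.6 along x.
pos_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pos_b = [(0.6, 0.0), (1.6, 0.0), (2.6, 0.0)]
S = [(0, 0)]  # one landmark pair already matched

_, i0, j0 = next_match(pos_a, pos_b, S, lam=0.0)  # similarity only
_, i1, j1 = next_match(pos_a, pos_b, S, lam=1.0)  # with landmark term
print("without landmarks:", (i0, j0), "with landmarks:", (i1, j1))
```

In this toy example the shift between preparations makes raw position misleading, and the landmark term restores the pairing that preserves each cell's relationship to the anchor.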

Utilizing landmarks is appropriate as an informative signal for matching neurons in
the leech, because there is a typified geometric structure. Although soma positions do vary
from animal to animal, often certain somas remain arranged with particular geometrical
relationships. For instance, the Nut and AE cells typically form a box-like pattern, and the
N and T sensory neurons usually will be arranged in a semicircle along the packet edge, which
often will wrap around the AP cell. These types of arrangements are useful for identification
of cells by eye, and we extend our algorithm to utilize these relationships.

Figure 2: Cell correspondences inferred across six H. verbana. Graphics show results of the
correspondence matching procedure across six animals. Color coding indicates
the correspondences, where matched cells across different animals share the same
color. We highlight two cells (depicted as 1 and 2) and show the matches as
lines linking neurons across the animals. Several cells remain unmatched and are
depicted using dashed lines (unfilled interior). The algorithm is capable of
handling partial matches where cells are not present in all six animals due to
true structural differences or losses either in their preparation or in their sensing.

The framework is extended to handle poor matches and missing cells by considering a 
sink cell in every animal. The sink cell has a fixed cost of matching, denoted as c, and acts 
as a threshold such that neuron matches with costs greater than c are disallowed. The sink 
cells are a soft representation of the probability that a particular neuron was not visible 
during a given preparation. 
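
One way to picture the sink cell is as extra "unmatched" slots appended to the candidate list, each costing $c$. The sketch below solves a tiny made-up instance by exhaustive search; it charges the sink cost only for cells of the first animal, which is a simplification, and the function name and cost values are hypothetical.

```python
from itertools import permutations

def match_with_sink(cost, c):
    """Assign each row (cell of animal a) to a distinct column (cell of animal b)
    or to a sink (None) at fixed cost c, minimizing the total cost by brute force.
    Simplification: unmatched columns are not charged."""
    n_a, n_b = len(cost), len(cost[0])
    options = list(range(n_b)) + [None] * n_a   # real cells plus one sink slot per row
    best_total, best_assign = float("inf"), None
    for perm in permutations(range(len(options)), n_a):
        assign = [options[p] for p in perm]
        total = sum(c if j is None else cost[i][j] for i, j in enumerate(assign))
        if total < best_total:
            best_total, best_assign = total, assign
    return best_assign, best_total

# Made-up matching costs for 3 cells in animal a (rows) and 2 cells in animal b (columns).
cost = [[0.2, 2.0],
        [1.8, 2.5],
        [2.2, 0.4]]

assign, total = match_with_sink(cost, c=1.0)
print(assign)
```

Here the middle cell has no candidate cheaper than $c$, so it is routed to the sink and treated as missing in the other animal.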

2.3 Pooling Across Animals 

Finally, in the third step, the framework reconstructs data corresponding to cells that are
missing and remain unobserved in some animals. In particular, if we consider the
electrophysiological activity for unobserved cells as latent random variables, then we can
infer those latent variables by exploiting the fact that they were observed in other animals.
Once we have correspondence information across animals, we can fill in missing
electrophysiological data. Formally, we invoke data completion via Probabilistic Principal
Component Analysis (PPCA) (Roweis, 1998; Tipping and Bishop, 1999). To apply PPCA, we
construct a matrix $Y$, where each row corresponds to a neuron and each column corresponds to
the fluorescence intensity in a short time interval. Further, since the correspondences between
all the animals are calculated, we can stack the data from all the animals in $Y$ such that
the rows are arranged according to the discovered correspondences (we use -1 to denote
absence of data due to missing cells in an animal). The PPCA algorithm recovers the
low-dimensional structure in the data, and inserts missing data via Expectation Maximization
(Dempster et al., 1977). The PPCA algorithm starts with an initialized low-dimensional
projection and alternates between the E-step and the M-step. The E-step is where the
missing data is estimated by considering statistical relationships in the data. The M-step
is where the estimates of the low-dimensional projection are further refined.

Figure 3: Computed canonical ganglion for H. verbana derived from the correspondence
matching algorithm. We used the results of the correspondence matching algorithm
to generate an average or canonical ganglion by computing mean location
and size for each cell that was matched across at least three different animals.
The shades of neurons are colored according to the weight determined by an LDA
projection that would distinguish between swim and crawl models (brighter colors
mean higher weight; the colors used were arbitrary).

Consider the matrix $Y$ (dimensions $c \times n$), which consists of neuronal activity
recordings of $c$ cells from all the animals and is constructed using the methodology described
above. We first scale all the values in the matrix $Y$ between 0 and 1. Let us denote the
low-dimensional representation of the data as matrix $X$ (dimensions $k \times n$, where
$k < c$) and the principal components as $C$ (dimensions $c \times k$). The PPCA algorithm
first initializes the matrices $X$ and $C$ randomly and then alternates between the following
two steps:


E-step: Estimate $Y = CX$

M-step: Refine $X_{new} = (C^T C)^{-1} C^T Y$ and $C_{new} = Y X_{new}^T (X_{new} X_{new}^T)^{-1}$


The algorithm converges when the maximum change in any individual dimension of the
estimate of $Y$ is less than 0.001. PPCA is guaranteed to converge, so it produces data
completion even for neurons that are not observed in some animals.
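
The alternation can be sketched for the simplest case $k = 1$, where the matrix inverses in the M-step reduce to scalar divisions. This is an illustrative simplification rather than our implementation: it uses a deterministic initialization instead of a random one, assumes every column has at least one observed entry, and runs on a made-up rank-1 matrix. The function name is hypothetical.

```python
MISSING = -1.0  # marker for unobserved entries; valid data are scaled to [0, 1]

def rank1_em_complete(Y, tol=1e-3, max_iter=500):
    """Fill MISSING entries of Y (list of rows) with a rank-1 model Y ~ C X."""
    c, n = len(Y), len(Y[0])
    missing = {(r, t) for r in range(c) for t in range(n) if Y[r][t] == MISSING}
    Y = [row[:] for row in Y]
    # Deterministic initialization (an assumption of this sketch).
    C = [1.0] * c
    X = []
    for t in range(n):
        obs = [Y[r][t] for r in range(c) if (r, t) not in missing]
        X.append(sum(obs) / len(obs))
    for _ in range(max_iter):
        # E-step: estimate the missing entries from the current model Y = C X.
        change = 0.0
        for r, t in missing:
            new_value = C[r] * X[t]
            change = max(change, abs(new_value - Y[r][t]))
            Y[r][t] = new_value
        if change < tol:
            break
        # M-step: X = (C^T C)^{-1} C^T Y, then C = Y X^T (X X^T)^{-1} (scalars for k = 1).
        cc = sum(v * v for v in C)
        X = [sum(C[r] * Y[r][t] for r in range(c)) / cc for t in range(n)]
        xx = sum(v * v for v in X)
        C = [sum(Y[r][t] * X[t] for t in range(n)) / xx for r in range(c)]
    return Y

# Made-up rank-1 data (outer product of [1.0, 0.5, 0.8] and [0.2, 0.4, 0.6, 1.0])
# with two entries hidden; their true values are 0.3 and 0.16.
Y_obs = [[0.2, 0.4, 0.6, 1.0],
         [0.1, 0.2, MISSING, 0.5],
         [MISSING, 0.32, 0.48, 0.8]]

completed = rank1_em_complete(Y_obs)
print(completed[1][2], completed[2][0])
```

Observed entries are never overwritten; only the entries flagged as missing are re-estimated on each pass.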

In our implementation, optimization for Step 1 (see Table 1) is performed via the Limited
Memory BFGS (Liu and Nocedal, 1989) routine, and energy minimization for Equation 3
is performed via iterative variational inference (Beal, 2003). There are three
parameters that need to be specified in the framework: $c$, the upper limit on the cost of
allowed matches; $\lambda$, the trade-off parameter between compatibility and relative locality
measure; and $k$, the dimensionality of the low-dimensional projection in PPCA. These
parameters are determined via a cross-validation methodology. The cross validation is carried
out by considering the aggregated matrix $Y$, randomly removing 10% of the observed data, and
computing the reconstruction error using an L2 norm on the removed data. This process is
repeated 10 times and parameters with minimum average reconstruction error are chosen.
The search space for parameters $c$ and $\lambda$ lies on a log scale (i.e., a grid of powers
of 10), while for $k$ we search a linear range (i.e., $k \in [1, 25]$).
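
The hold-out procedure can be sketched as follows. The reconstructor used here, which fills a missing entry with the mean of its row, is only a placeholder standing in for PPCA, and the function names, the seed, and the data are hypothetical. In a parameter search, this error would be computed for each candidate setting and the setting with the lowest average kept.

```python
import math
import random

MISSING = -1.0  # marker for unobserved entries

def row_mean_complete(Y):
    """Placeholder reconstructor (not PPCA): fill each missing entry with the
    mean of the observed entries in its row."""
    out = []
    for row in Y:
        obs = [v for v in row if v != MISSING]
        mean = sum(obs) / len(obs) if obs else 0.0
        out.append([mean if v == MISSING else v for v in row])
    return out

def holdout_error(Y, complete, frac=0.10, repeats=10, seed=0):
    """Average L2 reconstruction error over `repeats` random removals of a
    fraction `frac` of the observed entries."""
    rng = random.Random(seed)
    observed = [(r, t) for r, row in enumerate(Y)
                for t, v in enumerate(row) if v != MISSING]
    n_hide = max(1, round(frac * len(observed)))
    errors = []
    for _ in range(repeats):
        hidden = rng.sample(observed, n_hide)
        masked = [row[:] for row in Y]
        for r, t in hidden:
            masked[r][t] = MISSING
        recon = complete(masked)
        errors.append(math.sqrt(sum((recon[r][t] - Y[r][t]) ** 2 for r, t in hidden)))
    return sum(errors) / len(errors)

# Made-up matrices: one constant (perfectly predictable by a row mean), one varied.
Y_const = [[0.5] * 5 for _ in range(4)]
Y_varied = [[0.0, 0.1, 0.3, 0.6, 1.0] for _ in range(4)]

print(holdout_error(Y_const, row_mean_complete))
print(holdout_error(Y_varied, row_mean_complete))
```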


3. Experiments 

Training data for learning the parameter $A$ was collected by an experimentalist (EPF)
who hand-annotated 815 different match pairs across all the animals. Fig. 1 shows the
resulting compatibility measure for these data. Note how the physical properties (such
as size, relative location, packet membership) of the most likely matches (highlighted by
a white outline) illustrate the quality of the learned function. The matching procedure
results in a correspondence map (Fig. 2) matching neurons across the 6 different animals.
Once the correspondence map was calculated, it was used to generate a prototypic model
of the animal by averaging physical as well as functional properties (Fig. 3).

Because the resulting correspondence map was computed simultaneously across all the
animals, it provides a simple way to analyze the quality of the recovered solution. In addition
to the physical properties, the functional characteristics of any two matched neurons are
similar across animals (Figure 4). We observed that a simple estimator
based on average activity of neurons in five animals predicted the activity of the sixth one
(Figure 5). Here we estimated the entire time series of activity for a given cell in an animal
by considering the activity for the corresponding cell across the rest of the five animals. Two
different models for swim and crawl mode are computed where the prediction is performed
by computing an average across all the observed time-series.

Figure 4: Activity of neurons in a leech ganglion from a prior study (Briggman et al., 2005),
showing how their neuronal activity can be used to identify homologous neurons
across animals. Voltage-sensitive dye traces from two different neurons that were
considered to be matches by the correspondence matching algorithm. The traces
highlight that the algorithm has the capability of recovering correspondence across
cells that are functionally similar.

For all six animals, lower differences between the observed electrophysiological activity
and predictions made by a model learned from the rest of the animals confirm that the
framework had recovered correct correspondences between the neurons across animals.
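
The leave-one-animal-out check can be sketched as below, assuming traces have already been arranged so that row $k$ refers to the same matched cell in every animal. The sketch uses a single behavioral mode and made-up traces for three animals; the function names are hypothetical.

```python
def mean_trace(traces):
    """Pointwise average of several equal-length time series."""
    return [sum(vals) / len(vals) for vals in zip(*traces)]

def mse(p, q):
    """Mean-squared error between two equal-length time series."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) / len(p)

def leave_one_out_mse(traces_by_animal, test_animal):
    """traces_by_animal[a][k] is the trace of matched cell k in animal a.
    Predict each cell of the test animal as the average of the corresponding
    cell in the other animals and return the mean error over cells."""
    others = [t for a, t in enumerate(traces_by_animal) if a != test_animal]
    errors = []
    for k, observed in enumerate(traces_by_animal[test_animal]):
        predicted = mean_trace([animal[k] for animal in others])
        errors.append(mse(predicted, observed))
    return sum(errors) / len(errors)

# Made-up traces: cell 0 rises and cell 1 falls, with small per-animal offsets.
aligned = [
    [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]],
    [[0.1, 1.1, 2.1], [2.1, 1.1, 0.1]],
    [[-0.1, 0.9, 1.9], [1.9, 0.9, -0.1]],
]
# Same data with the two cells of animal 0 swapped, mimicking a wrong matching.
mismatched = [list(reversed(aligned[0]))] + aligned[1:]

err_aligned = leave_one_out_mse(aligned, test_animal=0)
err_mismatched = leave_one_out_mse(mismatched, test_animal=0)
print(err_aligned, err_mismatched)
```

A correct matching gives a much lower prediction error than a scrambled one, which is the comparison summarized in Figure 5.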

Although the matching algorithm performs quite well, it is likely that the algorithm
is far from perfect. Many of the matched cells may not be correct. Since the functional
responses of the cells are a factor for the matching, cells with little functional signal will be
harder to match than those with big signals. Cells lacking functional signal, however,
provide little information for predicting behavioral outcome. Thus, these cells likely
have poor matches, but are also likely the cells that are irrelevant to the swim-crawl
decision circuit. It is also possible that many cells are effectively the same given this data
set, but the matches do not truly reflect homologous pairs. We expect many cells to be
functionally the same, and mismatching these similar cells may not hurt our analysis.

The correspondence matching algorithm enables pooling of the data across animals,
which allows exploration that was not feasible previously. For example, Figure 6 shows
a 3-dimensional projection recovered by applying ISOMAP (Tenenbaum et al., 2000), a
non-linear dimensionality reduction method that is an extension to linear methods such as
PCA. Because the algorithm was applied to the entire pooled data, the recovered dimensions
are consistent across all the animals, and thus can be visualized and analyzed within the
same reference frames. Previously, application of such techniques (such as PCA and LDA
in (Briggman et al., 2005)) was limited to a single animal at a time, resulting in dimensions
which were incomparable across animals.

Figure 5: Bar graphs that highlight the results of testing the recovered correspondence
using a leave-one-animal-out analysis. The plots were generated by first considering
a candidate test animal and then building a predictive model for each cell
(from when the animal swam or crawled) using the remaining five animals. The
bar chart compares mean-squared error between predicted and observed
electrophysiological activity when matching using the proposed framework with random
selection. The differences across all of the six leave-one-animal-out test cases are
significant.

The pooling of data enabled by the methodology proved to be valuable in predictive models
of decision making. Figure 7 shows that pooling the data across animals enables earlier
predictions of one of the two behaviors (swimming or crawling) following stimulation than
data from a single animal. Specifically, PCA was performed on pooled data and the earliest
discrimination time between swim and crawl was determined according to the procedure
described in (Briggman et al., 2005). In Figure 3, we highlight cells in the composed canonical
ganglion that play an important role in the behavioral decisions of the animal. Combining
data across multiple animals enables transfer and overlay of information, allowing
aggregation of important statistical parameters and more robust empirical models. Figure 8
shows ganglion maps for six animals highlighting cells that contribute most towards
discrimination amongst the swim and the crawl trials. Note that the highly discriminative
cells (towards the red spectrum) are consistent in physical properties such as location and
size across the different animals. We also note that these cells are significantly different
from cell 208 that was identified in earlier studies (Briggman et al., 2005).



Figure 6: Using correspondences to predict behavior from neuronal activity. Identification
of corresponding neurons across animals enables larger data sets to be constructed
by pooling observations from multiple preparations, which in turn enables deeper
and more accurate data analysis to address questions of interest. This figure shows
non-linear projections generated by applying the ISOMAP algorithm. The blue
and red dots correspond to swim and crawl mode and depict the trajectories of
the voltage-sensitive dye recordings for each animal. Note that ISOMAP
applied to an individual animal might result in projections that are inconsistent
across the different animals. However, using the discovered correspondences of
neurons across animals, we combine the data from all six animals, and recover
projections that are consistent for all of the animals.


4. Related Work 


The work described in this paper builds upon many different sub-areas of machine learning.
In particular, the key ingredients include metric learning, correspondence matching and
probabilistic dimensionality reduction (Roweis, 1998; Tipping and Bishop, 1999). Distance
metric learning is a fairly active research area. Most of the work in distance metric learning
focuses on the k-Nearest Neighbor (k-NN) classification scenario (Duda et al., 2001) and often
aims to learn a Mahalanobis metric that is consistent with the training data (Frome et al.,
2007; Bar-Hillel et al.; Weinberger et al., 2006; Davis et al., 2006). The distance metric
learning method employed in this paper is closest to the work of Goldberger et al. (2005)
and Globerson and Roweis (2006), but modified to just consider the sets of similar cells
given by the user.

Figure 7: Bar graphs showing that pooled data allows us to discriminate between swim and
crawl significantly earlier than what was reported earlier using a PCA analysis
on data from a single animal (Briggman et al., 2005).


Correspondence problems arise in a multitude of applications. Computer vision
is particularly close to our scenario. Among the simplest are transformations of rigid
bodies, where geometry can be exploited (Goodrich and Mitchell, 1999; McAuley et al., 2008),
while correspondences among non-rigid objects, and between non-identical objects, can
pose significant challenges. Algorithms applied to more general correspondence problems
largely combine the compatibility of points by features with the local geometric
compatibility of matches. Such models can be formulated as graphical models (McAuley et al.,
2008; Torresani et al., 2008; Starck and Hilton, 2007) or as selecting nodes in an association
graph (Cho et al., 2010; Cour et al., 2006), and have been extended to higher-order criteria
(Duchenne et al., 2009; Zass and Shashua, 2008; Lee et al., 2011). Other methods consider
the Laplacian constructed from a neighborhood graph (Umeyama, 1988; Escolano et al.,
2011; Mateus et al., 2008), and some models are learned from full training examples
(Torresani et al., 2008). Closest to the idea of using reference points are approaches based on
seed points (Sharma et al., 2011), landmarks (Jegelka et al., 2014), coarse-to-fine strategies
(Starck and Hilton, 2007), and on guessing points that help orient the remaining points in
a rigid body (McAuley and Caetano, 2012).

5. Conclusion and Future Work 

The proposed methodology is likely to be even more useful in combination with other
data-centric analyses. For example, the model learned from past data can be employed
to guide future experimentation. By computing correspondences between the model and
data from an ongoing experiment in real-time, we can then use the model to guide
information extraction strategies. The methodology can also be extended to perform
within-leech analysis, such as discovering bilateral pairs of neurons. In addition, this
methodology can readily be used to analyze the simultaneous activity of multiple neurons in
other animals. We foresee valuable uses of the approach in overlaying data from larger nervous
systems and, moving beyond cells, to higher-level abstractions of nervous system organization,
such as components of retina or columns in vertebrate nervous systems. Given its simplicity and
the appeal of potentially pooling large quantities of data, the correspondence methodology
may find wide use in many areas of neuroscience.

Figure 8: Determining influential cells using linear discriminant analysis. The ganglion
maps from 6 experiments are shown. The maps are from the same experiments
as in Fig. 4. Cells are color-coded based on the magnitude of the contribution
to the linear discriminant direction. Red and yellow represent large magnitude
contributions, blue represents small contributions. We can see that there are at
least 3 cells that are influential and do not include cell 208 (marked using a white
arrow).


























